Lecture 06

Statistics 422

Mapping in R

Setup

library(readr)
library(tidyverse)
library(sf)
library(ggspatial)
library(knitr)
library(maptiles) # extra maps
library(tidyterra) # extra geoms
library(maps)
library(leaflet)
options(dplyr.width = Inf, dplyr.print_max = 1e9,  width = 144)

What we need to start

  • Some data
  • With coordinates
dispensary <- read_csv("https://raw.githubusercontent.com/lewv/S24STATS101A/main/data/dispensary.csv")
glimpse(dispensary) 
Rows: 688
Columns: 9
$ YEAR_START       <dbl> 2007, 2024, 2022, 2006, 2018, 2023, 2024, 2021, 2024, 2023, 2018, 2022, 2024, 2006, 2023, 2018, 2018, 2024, 2007, 202…
$ `BUSINESS NAME`  <chr> "CANNATOPIA GARDENS", "STANLEY ALLEY HOLDINGS LLC", "SAN PEDRO ISH LLC", "THE VAULT WOODLAND HILLS", "CLARMONTI CONSU…
$ `DBA NAME`       <chr> "CANNA SYLMAR", NA, "GOAT GLOBAL SOUTH LA", "MOTHER NATURE'S REMEDY", NA, "LA BREA COLLECTIVE", NA, "OFF THE CHARTS",…
$ `STREET ADDRESS` <chr> "13509 HUBBARD STREET", "6218 LANKERSHIM BLVD", "2522  S CENTRAL AVENUE    #2534", "22815   VENTURA BLVD", "945  E 10…
$ CITY             <chr> "SYLMAR", "NORTH HOLLYWOOD", "LOS ANGELES", "WOODLAND HILLS", "LOS ANGELES", "LOS ANGELES", "NORTH HILLS", "LOS ANGEL…
$ ZIP_CODE         <dbl> 91342, 91606, 90011, 91364, 90021, 90019, 91343, 90038, 90015, 91342, 90021, 91311, 91601, 90012, 90001, 90021, 90023…
$ longitude        <dbl> -118.4274, -118.3857, -118.2544, -118.6240, -118.2472, -118.3478, -118.4946, -118.3408, -118.2543, -118.4152, -118.24…
$ latitude         <dbl> 34.3099, 34.1839, 34.0190, 34.1659, 34.0333, 34.0482, 34.2222, 34.0836, 34.0342, 34.2927, 34.0257, 34.2457, 34.1605, …
$ COUNCIL_DISTRICT <dbl> 7, 2, 9, 3, 14, 10, 12, 5, 14, 7, 14, 12, 2, 14, 9, 14, 14, 8, 6, 13, 2, 4, 13, 14, 8, 5, 10, 9, 14, 14, 3, 14, 14, 7…

Always check

kable(head(dispensary))
YEAR_START BUSINESS NAME DBA NAME STREET ADDRESS CITY ZIP_CODE longitude latitude COUNCIL_DISTRICT
2007 CANNATOPIA GARDENS CANNA SYLMAR 13509 HUBBARD STREET SYLMAR 91342 -118.4274 34.3099 7
2024 STANLEY ALLEY HOLDINGS LLC NA 6218 LANKERSHIM BLVD NORTH HOLLYWOOD 91606 -118.3857 34.1839 2
2022 SAN PEDRO ISH LLC GOAT GLOBAL SOUTH LA 2522 S CENTRAL AVENUE #2534 LOS ANGELES 90011 -118.2544 34.0190 9
2006 THE VAULT WOODLAND HILLS MOTHER NATURE’S REMEDY 22815 VENTURA BLVD WOODLAND HILLS 91364 -118.6240 34.1659 3
2018 CLARMONTI CONSULTING LLC NA 945 E 10TH STREET # LOS ANGELES 90021 -118.2472 34.0333 14
2023 PEACE AND JOY LLC LA BREA COLLECTIVE 5057 W PICO BLVD LOS ANGELES 90019 -118.3478 34.0482 10

Some light data management

dispensary <- dispensary %>% 
  mutate(COUNCIL_DISTRICT = as.factor(COUNCIL_DISTRICT),
         YEAR_CLASS = cut(YEAR_START, c(-Inf, 2015, 2018, 2021, Inf),
                          labels = c("Pre-2016", "2016-2018", "2019-2021", "2022-present")))
  
# check
dispensary %>% group_by(YEAR_START, YEAR_CLASS) %>% tally()
# A tibble: 21 × 3
# Groups:   YEAR_START [21]
   YEAR_START YEAR_CLASS       n
        <dbl> <fct>        <int>
 1       2000 Pre-2016         1
 2       2005 Pre-2016         5
 3       2006 Pre-2016        70
 4       2007 Pre-2016        76
 5       2008 Pre-2016         1
 6       2009 Pre-2016         3
 7       2010 Pre-2016         1
 8       2011 Pre-2016         7
 9       2012 Pre-2016         4
10       2013 Pre-2016         1
11       2014 Pre-2016         7
12       2015 Pre-2016         1
13       2016 2016-2018        3
14       2017 2016-2018        2
15       2018 2016-2018      231
16       2019 2019-2021        6
17       2020 2019-2021        9
18       2021 2019-2021       56
19       2022 2022-present   121
20       2023 2022-present    75
21       2024 2022-present     8

Simplest - use geom_point

The simplest type of map is a scatterplot with XY points plotted.

ggplot(dispensary, 
       aes(x = longitude, y = latitude, color =  COUNCIL_DISTRICT)) +
         geom_point() +
         theme_classic() +
         theme(legend.position = "none")

geom_point with shape too

ggplot(dispensary, 
       aes(x = longitude, y = latitude, color =  YEAR_CLASS)) +
       geom_point(aes(shape=YEAR_CLASS), size=2) +  
       scale_color_manual(values = c("red","green","cyan", "black")) +
       theme_classic() 

The sf package in R

  • This package has tools for geographic data operations.
  • It integrates with tidyverse easily (e.g., ggplot, lubridate)
  • Uses the “simple features” standard for storing and manipulating spatial data.
  • It can represent geometric objects like points, lines, polygons, and more complex objects like multipoints, multilines, and multipolygons.
  • See Wikipedia for more information about the simple features standard

Some technical background about sf in R

  • An sf object is a data frame where each row represents a spatial feature (such as a geographic location or area), and each column contains the attributes associated with that feature.
  • One of the columns is a special list-column that contains geometry data in a format understood by the sf package
  • The geometries in sf objects include POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON and can represent almost any kind of spatial data.
  • The sf package is designed work with dplyr, ggplot2, and other tidyverse packages
  • The sf package allows import/export to other spatial data formats,

Example - Convert a data frame with point coordinates

# Convert the data frame to an sf object
points_sf <- st_as_sf(dispensary, 
                      coords = c("longitude", "latitude"), 
                      crs = 4326,
                      remove = FALSE)
  • st_as_sf() converts a non-spatial feature object to spatial feature
  • for point data, need the columns with the coordinates (x, y)
  • crs is the coordinate reference sysem
  • remove = FALSE keep the original variables

What is a crs = 4326?

  • CRS means “Coordinate Reference System”. The CRS defines how our 2D map relates to the corresponding locations on Earth.
  • CRS states how the data is to be projected and how to interpret the coordinate system used in the data. It helps to align spatial data from different sources
  • 4326 refers to a particular (there are many) CRS — WGS 84 (World Geodetic System 1984): This is the standard coordinate system used by the Global Positioning System (GPS)
  • Knowing which CRS matters when we bring in additional spatial information

Examine its structure

str(points_sf)
sf [688 × 11] (S3: sf/tbl_df/tbl/data.frame)
 $ YEAR_START      : num [1:688] 2007 2024 2022 2006 2018 ...
 $ BUSINESS NAME   : chr [1:688] "CANNATOPIA GARDENS" "STANLEY ALLEY HOLDINGS LLC" "SAN PEDRO ISH LLC" "THE VAULT WOODLAND HILLS" ...
 $ DBA NAME        : chr [1:688] "CANNA SYLMAR" NA "GOAT GLOBAL SOUTH LA" "MOTHER NATURE'S REMEDY" ...
 $ STREET ADDRESS  : chr [1:688] "13509 HUBBARD STREET" "6218 LANKERSHIM BLVD" "2522  S CENTRAL AVENUE    #2534" "22815   VENTURA BLVD" ...
 $ CITY            : chr [1:688] "SYLMAR" "NORTH HOLLYWOOD" "LOS ANGELES" "WOODLAND HILLS" ...
 $ ZIP_CODE        : num [1:688] 91342 91606 90011 91364 90021 ...
 $ longitude       : num [1:688] -118 -118 -118 -119 -118 ...
 $ latitude        : num [1:688] 34.3 34.2 34 34.2 34 ...
 $ COUNCIL_DISTRICT: Factor w/ 16 levels "0","1","2","3",..: 8 3 10 4 15 11 13 6 15 8 ...
 $ YEAR_CLASS      : Factor w/ 4 levels "Pre-2016","2016-2018",..: 1 4 4 1 2 4 4 3 4 4 ...
 $ geometry        :sfc_POINT of length 688; first list element:  'XY' num [1:2] -118.4 34.3
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:10] "YEAR_START" "BUSINESS NAME" "DBA NAME" "STREET ADDRESS" ...

still ggplot, still ugly

dist_extent <- st_bbox(points_sf[points_sf$COUNCIL_DISTRICT == 5,])
dist5 <- points_sf[points_sf$COUNCIL_DISTRICT == 5,]

buffer_size <- 0.01  

# Expand the bounding box by the buffer size
expanded_extent <- c(
  xmin = dist_extent$xmin - buffer_size,
  xmax = dist_extent$xmax + buffer_size,
  ymin = dist_extent$ymin - buffer_size,
  ymax = dist_extent$ymax + buffer_size
)

ggplot() +
  geom_sf(data = dist5) +
  geom_sf_label(data = dist5, aes(label = `BUSINESS NAME`), size = 2) + 
  coord_sf(xlim = c(expanded_extent[1], expanded_extent[2]), 
           ylim = c(expanded_extent[3], expanded_extent[4]),
           expand = FALSE) +
  theme_dark() +
  theme(legend.position = "none")

Load a basemap (polygons)

We don’t have a good reference for our points, they are just points or worse, labels. A basemap can provide administrative boundaries, roads etc. This one, from the Los Angeles Times, is a geojson file of neighborhood boundaries.

basemap <- st_read("../../../LA_Times_Neighborhood_Boundaries.geojson")
Reading layer `7a495245-19b6-4449-b5b5-024a29431bbe2020328-1-walab9.vpxce' from data source 
  `/Users/vivian/Desktop/SP24STATS422/LA_Times_Neighborhood_Boundaries.geojson' using driver `GeoJSON'
Simple feature collection with 114 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -118.6681 ymin: 33.70467 xmax: -118.1554 ymax: 34.33731
Geodetic CRS:  WGS 84

Examine the structure

str() can be helpful when you are trying understand new R objects

str(basemap)
Classes 'sf' and 'data.frame':  114 obs. of  3 variables:
 $ OBJECTID: int  1 2 3 4 5 6 7 8 9 10 ...
 $ name    : chr  "Adams-Normandie" "Arleta" "Arlington Heights" "Atwater Village" ...
 $ geometry:sfc_MULTIPOLYGON of length 114; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:14, 1:2] -118 -118 -118 -118 -118 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
  ..- attr(*, "names")= chr [1:2] "OBJECTID" "name"

Basemap

We can plot polygons as easily as points

ggplot() +
  geom_sf(data = basemap) +
  theme_void()  

Basemap modifies easily

With knowledge about the structure of the basemap, we can modify it using dplyr or base R functions:

westside <- basemap %>% filter(name %in% c("Westwood", "West Los Angeles", "Sawtelle", 
"Palms", "Mar Vista", "Cheviot Hills", "Century City", "Rancho Park"))

ggplot(westside) +
  geom_sf(data = westside, aes(fill = as.factor(name))) +
  geom_sf_text(aes(label = name)) +
  theme(legend.position = "none")

Apply it

ggplot(data = points_sf) +
  geom_sf(data = basemap) +
  geom_sf(aes(color = YEAR_CLASS), alpha = 0.5) +
  labs(title = "Map of Los Angeles Dispenaries") +
  theme_minimal()

Also

westside_extent <- st_bbox(westside)

ggplot(data = dist5) +
  geom_sf(data = westside, aes(fill = as.factor(name))) +
  geom_sf_label(data = westside, aes(label = name), size = 2, color = "black", 
fill = "NA") +
  geom_sf_label(data = dist5, aes(label = `BUSINESS NAME`), size = 2, color = "black", fill = "green") + 
  coord_sf(xlim = c(westside_extent$xmin, westside_extent$xmax-0.003), 
           ylim = c(westside_extent$ymin, westside_extent$ymax),
           expand = FALSE) +
  labs(title = "Map of West Los Angeles Dispenaries") +
  theme_void()  +
  theme(legend.position = "none")

Finding basemaps

You can find these all over the internet. Be forewarned, some can be quite large and complicated.

City of LA
County of LA
State of California
US Census
NTS
Arcgis

Other basemaps libraries

library(maptiles) # extra maps
library(tidyterra) # extra geoms

basemap <- get_tiles(points_sf, provider = "OpenStreetMap")

Open Street Map

ggplot(data = points_sf) +
  geom_spatraster_rgb(data = basemap) +
  geom_sf(data = points_sf, color = "red", alpha = 0.2) +
  labs(title = "Map of Los Angeles Dispenaries") +
  #coord_sf(xlim = c(-118.8, -118), ylim = c(33.6, 34.4)) + 
  theme_void()

Maps are layers… order matters

This will just cover the points, some throw errors

ggplot(data = points_sf) +
  geom_sf(data = points_sf, color = "red", alpha = 0.2) +
  geom_spatraster_rgb(data = basemap) +
  labs(title = "Map of Los Angeles Dispenaries layers switched") +
  theme_void()

The Shapefile

ESRI + Shapefile

  • ESRI (Environmental Systems Research Institute) dominates inGIS software and solutions - environmental management, local government, utilities, transportation

  • ESRI is used for mapping, spatial analysis and also decision-making with geographic data..

  • The Shapefile a data format for GIS software. It is developed and regulated by ESRI it stores the location, shape, and attributes of geographic features in a map.

Inside a shapefile

  • A shapefile is actually a folder/directory consisting of at minimum three files.

  • All 3 must be present for the shapefile to be complete and functional:

    • .shp — the file that stores the geometric location and shape of all features.
    • .shx — the shape index format; a positional index of the feature geometry to allow for seeking forward quickly to the geometry of a feature in the .shp file.
    • .dbf — an attribute table for each shape, in dBase IV (1979) format.

Shapefiles - boundaries

Commonly seen as boundaries (but could have only points or lines)

Downloaded a shapefile for the 50 US states (notice the crs is different)

st_layers("../../../cb_2018_us_state_20m")
Driver: ESRI Shapefile 
Available layers:
            layer_name geometry_type features fields crs_name
1 cb_2018_us_state_20m       Polygon       52      9    NAD83

Shapefiles - st_read()

Actually transform it into an R object:

US_STATE <- st_read("../../../cb_2018_us_state_20m", layer = "cb_2018_us_state_20m")
Reading layer `cb_2018_us_state_20m' from data source `/Users/vivian/Desktop/SP24STATS422/cb_2018_us_state_20m' using driver `ESRI Shapefile'
Simple feature collection with 52 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  NAD83

Shapefiles - str()

See the contents

str(US_STATE)
Classes 'sf' and 'data.frame':  52 obs. of  10 variables:
 $ STATEFP : chr  "24" "19" "10" "39" ...
 $ STATENS : chr  "01714934" "01779785" "01779781" "01085497" ...
 $ AFFGEOID: chr  "0400000US24" "0400000US19" "0400000US10" "0400000US39" ...
 $ GEOID   : chr  "24" "19" "10" "39" ...
 $ STUSPS  : chr  "MD" "IA" "DE" "OH" ...
 $ NAME    : chr  "Maryland" "Iowa" "Delaware" "Ohio" ...
 $ LSAD    : chr  "00" "00" "00" "00" ...
 $ ALAND   : num  2.52e+10 1.45e+11 5.05e+09 1.06e+11 1.16e+11 ...
 $ AWATER  : num  6.98e+09 1.08e+09 1.40e+09 1.03e+10 3.39e+09 ...
 $ geometry:sfc_MULTIPOLYGON of length 52; first list element: List of 2
  ..$ :List of 1
  .. ..$ : num [1:6, 1:2] -76 -76 -76 -76 -76 ...
  ..$ :List of 1
  .. ..$ : num [1:261, 1:2] -79.5 -79.5 -79.5 -79.4 -79 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:9] "STATEFP" "STATENS" "AFFGEOID" "GEOID" ...

Plot

Adding a basemap is optional (but easy)

basemap <- get_tiles(US_STATE, provider = "OpenStreetMap")

use ggplot

ggplot(data = US_STATE) +
  geom_spatraster_rgb(data = basemap) +
  geom_sf(color = "blue", fill = "cornflowerblue") + 
  coord_sf(xlim = c(-171.791110603, -66.96466), ylim = c(18.91619, 71.3577635769)) + 
  labs(title = "US MAP") +
  theme(legend.position = "none")

Bounding Boxes

Sometimes you need to know the limits around a geography:

https://gist.github.com/graydon/11198540

I needed limits for the USA map because Guam was included but I thought the extent was too large

Lines

Roads (in this example trails) are just lines as we see in this shapefile. This originated from the National Parks Service or the National Geological Survey and are publicly available

st_layers("../../../TRAN_Wyoming_State_Shape")
Driver: ESRI Shapefile 
Available layers:
           layer_name geometry_type features fields crs_name
1 Trans_AirportRunway       Polygon      129     16    NAD83
2  Trans_AirportPoint         Point      248     14    NAD83
3   Trans_RailFeature   Line String     2317     16    NAD83
4  Trans_TrailSegment   Line String     5910     29    NAD83
5   Trans_RoadSegment   Line String   302213     26    NAD83
6  Meta_ProcessDetail            NA      103     15     <NA>
7  Meta_DatasetDetail            NA       39     26     <NA>
8 BPFeatureToMetadata            NA   393665      3     <NA>

Reading only the Trails

WY_TRAIL <- st_read("../../../TRAN_Wyoming_State_Shape", layer = "Trans_TrailSegment")
Reading layer `Trans_TrailSegment' from data source `/Users/vivian/Desktop/SP24STATS422/TRAN_Wyoming_State_Shape' using driver `ESRI Shapefile'
Simple feature collection with 5910 features and 29 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: -111.0545 ymin: 40.99478 xmax: -104.0532 ymax: 45.00479
Geodetic CRS:  NAD83

Check its structure

str(WY_TRAIL)
Classes 'sf' and 'data.frame':  5910 obs. of  30 variables:
 $ permanenti: chr  "bf1004df-5f00-4e36-965c-35a82e4ac3e9" "07c816f6-cc60-48f9-98f1-c906f7158909" "5d6d71c5-fe6f-491f-b6b9-28b55d011914" "221bb537-8975-4f7e-bba8-1151f697feb5" ...
 $ name      : chr  "South Boundary Trail: Grassy Lake-South Entrance" "Cement 1601" "Cook Lake" "Sundance  East Fork Quarry" ...
 $ namealtern: chr  NA NA NA NA ...
 $ trailnumbe: chr  NA NA "206" "206" ...
 $ trailnum_1: chr  NA "206" NA NA ...
 $ sourcefeat: chr  "Yellowstone National Park" NA NA NA ...
 $ sourcedata: chr  "{A2B9A873-ACAC-4D1C-B014-5B090D31938D}" "{94D89690-09E1-44DB-AC11-0ECB4C5D1C13}" "{94D89690-09E1-44DB-AC11-0ECB4C5D1C13}" "{94D89690-09E1-44DB-AC11-0ECB4C5D1C13}" ...
 $ sourceda_1: chr  "NPS Trails 10/2019" "USFS Trails 12/2017" "USFS Trails 12/2017" "USFS Trails 12/2017" ...
 $ sourceorig: chr  "National Park Service" "U.S. Forest Service" "U.S. Forest Service" "U.S. Forest Service" ...
 $ loaddate  : Date, format: "2020-01-06" "2019-12-09" "2019-12-09" "2020-01-06" ...
 $ trailtype : chr  "Terra Trail" "Terra Trail" "Terra Trail" "Terra Trail" ...
 $ hikerpedes: chr  "Y" "Y" "Y" "Y" ...
 $ bicycle   : chr  "N" NA NA NA ...
 $ packsaddle: chr  "N" NA NA NA ...
 $ atv       : chr  "N" NA NA NA ...
 $ motorcycle: chr  "N" NA NA NA ...
 $ ohvover50i: chr  "N" NA NA NA ...
 $ snowshoe  : chr  "N" NA NA NA ...
 $ crosscount: chr  "N" NA NA NA ...
 $ dogsled   : chr  "N" NA NA NA ...
 $ snowmobile: chr  "N" NA NA NA ...
 $ nonmotoriz: chr  NA NA NA NA ...
 $ motorizedw: chr  NA NA NA NA ...
 $ primarytra: chr  "NPS" "FS" "FS" "FS" ...
 $ nationaltr: chr  NA NA NA NA ...
 $ lengthmile: num  6.5384 0.124 0.544 0.2844 0.0493 ...
 $ networklen: num  10.83 2.46 128.35 128.35 128.35 ...
 $ SHAPE_Leng: num  NA NA NA NA NA NA NA NA NA NA ...
 $ ObjectID  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ geometry  :sfc_MULTILINESTRING of length 5910; first list element: List of 1
  ..$ : num [1:170, 1:2] -111 -111 -111 -111 -111 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTILINESTRING" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:29] "permanenti" "name" "namealtern" "trailnumbe" ...

Filter using dplyr

sf just works so well with tidyverse, you don’t need to think too hard about the programming aspect, instead, be the researcher/cartographer/data scientist etc.

Yellowstone <- WY_TRAIL %>% filter(sourcefeat == "Yellowstone National Park")

Obtain a basemap that fits the map we have extracted

basemap <- get_tiles(Yellowstone, provider = "OpenStreetMap")

Put it all together

ggplot(data = Yellowstone) +
  geom_spatraster_rgb(data = basemap) +
  geom_sf() + 
  labs(title = "Yellowstone Trails") +
  theme(legend.position = "none")

Geocoding

Point with addresses only

Geocoding means providing geographical coordinates corresponding to a location.

Example: I have a file of In n Out locations with no lat/lon:

library(dplyr, warn.conflicts = FALSE)

innout <- read_csv("https://raw.githubusercontent.com/lewv/S24STATS101A/main/data/in_n_out.csv")
head(innout)
# A tibble: 6 × 5
  name                      address               city      state   zip
  <chr>                     <chr>                 <chr>     <chr> <dbl>
1 IRVINE                    4115 CAMPUS DRIVE     IRVINE    CA    92612
2 CARLSBAD                  5950 AVENIDA ENCINAS  CARLSBAD  CA    92008
3 ORANGE (TUSTIN & LINCOLN) 2585 N. TUSTIN ST.    ORANGE    CA    92865
4 MURRIETA                  39697 AVENIDA ACACIAS MURRIETA  CA    92563
5 CORONA (AUTO CENTER DR.)  450 AUTO CENTER DR.   CORONA    CA    92882
6 VACAVILLE                 170 NUT TREE PKWY     VACAVILLE CA    95687

Check

Geocoding preparation

We are using the package “tidygeocoder” here which is free but imperfect.

If we were doing this for a high stakes project, we might use Google as our geocoder.

In either case, the method is the same, the addresses must be prepared

library(tidygeocoder)

innout <-innout %>% filter(state == "CA"  & substr(innout$zip, 1, 3) < 919 & zip != 91761)
innout <- innout %>% mutate(address2 = paste(innout$address, innout$city, innout$state, innout$zip, sep =", "))

head(innout)
# A tibble: 6 × 6
  name                        address                 city          state   zip address2                                       
  <chr>                       <chr>                   <chr>         <chr> <dbl> <chr>                                          
1 COVINA                      1371 GRAND AVE.         COVINA        CA    91724 1371 GRAND AVE., COVINA, CA, 91724             
2 WESTCHESTER                 9149 S. SEPULVEDA BLVD. LOS ANGELES   CA    90045 9149 S. SEPULVEDA BLVD., LOS ANGELES, CA, 90045
3 REDONDO BEACH               3801 INGLEWOOD AVE      REDONDO BEACH CA    90278 3801 INGLEWOOD AVE, REDONDO BEACH, CA, 90278   
4 TORRANCE (CARSON ST.)       730 W. CARSON           TORRANCE      CA    90502 730 W. CARSON, TORRANCE, CA, 90502             
5 PORTER RANCH                19901 RINALDI STREET    PORTER RANCH  CA    91326 19901 RINALDI STREET, PORTER RANCH, CA, 91326  
6 GLENDALE (BRAND & BROADWAY) 119 S. BRAND BLVD.      GLENDALE      CA    91204 119 S. BRAND BLVD., GLENDALE, CA, 91204        

Geocoding

The addresses are passed to a geocoding service

innout_lat_longs <- innout %>%
  geocode(address2, method = 'osm', lat = latitude , long = longitude)
innout_lat_longs <- innout_lat_longs %>% na.omit()
name address city state zip address2 latitude longitude
COVINA 1371 GRAND AVE. COVINA CA 91724 1371 GRAND AVE., COVINA, CA, 91724 34.06338 -117.8684
WESTCHESTER 9149 S. SEPULVEDA BLVD. LOS ANGELES CA 90045 9149 S. SEPULVEDA BLVD., LOS ANGELES, CA, 90045 33.95371 -118.3968
REDONDO BEACH 3801 INGLEWOOD AVE REDONDO BEACH CA 90278 3801 INGLEWOOD AVE, REDONDO BEACH, CA, 90278 33.89210 -118.3618
TORRANCE (CARSON ST.) 730 W. CARSON TORRANCE CA 90502 730 W. CARSON, TORRANCE, CA, 90502 33.83114 -118.2884
PORTER RANCH 19901 RINALDI STREET PORTER RANCH CA 91326 19901 RINALDI STREET, PORTER RANCH, CA, 91326 34.27587 -118.5680
GLENDALE (BRAND & BROADWAY) 119 S. BRAND BLVD. GLENDALE CA 91204 119 S. BRAND BLVD., GLENDALE, CA, 91204 34.14558 -118.2553

Result

Now we can map it after we convert it to a spatial feature:

points_sf <- st_as_sf(innout_lat_longs, 
                      coords = c("longitude", "latitude"), 
                      crs = 4326,
                      remove = FALSE)
                      
basemap <- get_tiles(points_sf, provider = "OpenStreetMap", zoom = 9)

ggplot(data = points_sf) +
  geom_spatraster_rgb(data = basemap) +
  geom_sf(data = points_sf, shape = 21, fill = "yellow", color = "black", size = 3) +
  labs(title = "Map of In N Out Locations") +
  coord_sf(xlim = c(-118.9, -117.5), ylim = c(33.6, 34.5)) + 
  theme_minimal()

GeoJSON files

  • Saw one earlier (neighborhoods from the LA Times)
  • GeoJSON is a format for geographic data using JSON (JavaScript Object Notation).
  • Developed for internet use, GeoJSON is human readable, can be very large, can also contain data related to the spatial structures.
  • It is open standard format and used heavily in web applications

Point example

schools <- st_read("../../../Schools_Colleges_and_Universities.geojson")
Reading layer `Schools_Colleges_and_Universities' from data source `/Users/vivian/Desktop/SP24STATS422/Schools_Colleges_and_Universities.geojson' using driver `GeoJSON'
Simple feature collection with 3214 features and 19 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6305234 ymin: 1581822 xmax: 6652590 ymax: 2111962
Projected CRS: NAD83 / California zone 5 (ftUS)

Problem it is using a non-standard CRS. We convert:

schools_wgs84 <- st_transform(schools, st_crs(westside))  
st_crs(schools_wgs84)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

Now we can combine and plot

p <- ggplot(westside) +
  geom_sf(data = westside, aes(fill = as.factor(name))) +
  geom_sf_text(aes(label = name)) +
  geom_sf(data = schools_wgs84) +
  coord_sf(xlim = c(westside_extent$xmin, westside_extent$xmax), 
           ylim = c(westside_extent$ymin, westside_extent$ymax),
           expand = FALSE) +
  theme(legend.position = "none")

p

Point in Polygon

An operation in which points from one spatial feature dataset are overlaid on the polygons of another spatial feature dataset to determine which points are contained within the polygons.

In our situation, we have schools and we have neighbors and cities. We will use a different shapefile for the County of LA

st_layers("../../../County")
Driver: ESRI Shapefile 
Available layers:
                                  layer_name geometry_type features fields                         crs_name
1 City_and_Unincorporated_Boundaries_(Legal)       Polygon      347     16 NAD83 / California zone 5 (ftUS)

Reading the shapefile

county <- st_read("../../../County", layer = "City_and_Unincorporated_Boundaries_(Legal)")
Reading layer `City_and_Unincorporated_Boundaries_(Legal)' from data source `/Users/vivian/Desktop/SP24STATS422/County' using driver `ESRI Shapefile'
Simple feature collection with 347 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 6275509 ymin: 1385758 xmax: 6668481 ymax: 2122085
Projected CRS: NAD83 / California zone 5 (ftUS)

Matching CRS

The CRS for the county map is not the same as the schools, so we can transform one of them. Always check too.

county_wgs84 <- st_transform(county, st_crs(schools_wgs84))  
st_crs(county_wgs84)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

Finding the points in polygon

  • The function st_intersects will overly the schools on the county map and the intersections are noted.

  • Using lengths() from base R, we just measure the length of the vectors

  • I chose to log() transform the counts

  • Lastly, write them back to the polygons file

intersections <- st_intersects(county_wgs84, schools_wgs84)
point_counts <- lengths(intersections)
county_wgs84$point_count <- log(point_counts+1)

(Choropleth) Map the result

The only trick is the aesthetic where fill = point_count

ggplot(county_wgs84) +
  geom_sf(aes(fill = point_count)) +  # Fill based on point counts
  coord_sf(xlim = c(-119, -117.6), 
           ylim = c(33.68, 34.8),
           expand = FALSE) +
  scale_fill_viridis_b() 

Choropleth maps

Definition

  • A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed
  • Provides easy visualization of measurement variation across a geographic area
  • Typically uses administrative boundaries like countries, states, counties, or districts.
  • Typically darker shades often represent higher values, and lighter shades represent lower values.
  • The data should be standardized (adjusted for the size of the population or area).

Reading the shapefile: US Counties

US_COUNTY <- st_read("../../../cb_2018_us_county_20m", layer = "cb_2018_us_county_20m")
Reading layer `cb_2018_us_county_20m' from data source `/Users/vivian/Desktop/SP24STATS422/cb_2018_us_county_20m' using driver `ESRI Shapefile'
Simple feature collection with 3220 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  NAD83

Check the map

ggplot(data = US_COUNTY) +
  geom_sf(color = "black", fill = "white") + 
  coord_sf(xlim = c(-172.791110603, -66.96466), ylim = c(18.91619, 71.3577635769)) + 
  labs(title = "US County Map") +
  theme(legend.position = "none")

Obtain Data & Join

Spatial Data using sf creates simple data frame type of spatial data

So it handles the same way

poverty <- read_csv("https://raw.githubusercontent.com/lewv/S24STATS101A/main/data/poverty_us_2022.csv")  
COUNTY_pov <- inner_join(US_COUNTY, poverty)

Map to a ggplot object

Save your ggplot for further processing

p <- ggplot(data = COUNTY_pov) +
  geom_sf(aes(fill = Pct_Below_Poverty)) + 
  coord_sf(xlim = c(-172.791110603, -66.96466), ylim = c(18.91619, 71.3577635769)) +
  scale_fill_gradient2() +
  labs(title = "US County Poverty Map", fill = "% below poverty") +
  theme_classic()
p

Adding Interactivity

Use the plotly library to add quick interactivity to your map:

library(plotly)
ggplotly(p, tooltip = c("fill"))

Leaflet

About

Leaflet is used for creating interactive maps. Many websites use it to create maps for reporting purposes. This document serves as an brief introduction to some of its features.

The package leaflet must be installed. We call it up with the library function. It is best to use the “pipe” operator |> to pass data as the function calls can become quite complex.

The basic steps used in creating a map in leaflet:

  1. Create a map widget by calling leaflet().
  2. Add layers to the map by using layer functions to modify the map widget.
  3. Repeat step 2 as many times as you like
  4. Print the map widget to display it.

Basic Map

Here is a very basic map centered on the Stat Department. m is the map widget. We use addTiles() to add a base map then we use addMarkers to identify UCLA. Then we print the map.

#library(leaflet)
m <- leaflet() |>
  addTiles() |>
  addMarkers(lng = -118.442506,
             lat = 34.069649,
             popup = "UCLA")
m  # Print the map

Map libraries in R - State

We can incorporate the maps library available in R. We can overlay a map of the United States. We extract the US map and save it to a new object called mapStates. A new map widget is created and the state boundaries are shaded.

#library(maps)
mapStates <- map("state", fill = TRUE, plot = FALSE)
leaflet(data = mapStates) |>
  addTiles() |>
  addPolygons(fillColor = topo.colors(10, alpha = NULL), stroke = FALSE)

Map libraries in R - County

mapCounties <- map("county", fill = TRUE, plot = FALSE)
leaflet(data = mapCounties) |>
  addTiles() |>
  addPolygons(fillColor = topo.colors(10, alpha = NULL), stroke = FALSE)

Other Places

Just to show you that we can map other nations.

mapCyprus <- map("world", "cyprus", fill = TRUE, plot = FALSE)
leaflet(data = mapCyprus) |>
  addTiles() |>
  addMarkers(lng = 33.370985, 35.174318, popup = "Famagusta Gate")

Easy Basemaps

We have more choices for a basemap. You can use addWMSTiles() to add WMS (Web Map Service) tiles. The map below shows the Base Reflectivity (a measure of the intensity of precipitation) using the WMS from the Iowa Environmental Mesonet:

leaflet() |>
  addTiles() |>
  setView(lng = -98.5795,
          lat = 39.8282,
          zoom = 4) |>
  addWMSTiles(
    "http://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r.cgi",
    layers = "nexrad-n0r-900913",
    options = WMSTileOptions(format = "image/png", transparent = TRUE),
    attribution = "Weather data © 2012 IEM Nexrad"
  )

The weather map is current but would be updated every time we access the weather web site.

Using our other data

We have come full circle

basemap <- st_read("../../../LA_Times_Neighborhood_Boundaries.geojson")
Reading layer `7a495245-19b6-4449-b5b5-024a29431bbe2020328-1-walab9.vpxce' from data source 
  `/Users/vivian/Desktop/SP24STATS422/LA_Times_Neighborhood_Boundaries.geojson' using driver `GeoJSON'
Simple feature collection with 114 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -118.6681 ymin: 33.70467 xmax: -118.1554 ymax: 34.33731
Geodetic CRS:  WGS 84
innout_lat_longs <- read_csv("https://raw.githubusercontent.com/lewv/S24STATS101A/main/data/innout_lat_longs.csv")
leaflet(data=basemap) %>%
  addTiles() %>%
  addPolygons(fillColor = "cyan", 
              weight = 4, 
              smoothFactor = 0.5) %>%
  addCircleMarkers(data = innout_lat_longs, 
                   ~longitude, ~latitude,
                   popup = ~name, 
                   radius = 5, 
                   fillColor = "red",
                   fillOpacity = 0.8,
                   color = "yellow",
                   weight = 1)

Closing

  • Unlimited possibilities with maps
  • Many of the parts are flexible and interchangeable